We present a new criterion for the goodness of global fits. It involves an exploration 
of the variation of $\chi^2$ for subsets of data. 



1 Introduction 

This talk addresses the questions of quantifying how good global fits are, and 
of how we would know a theory is wrong; it summarizes our Ref. [1]. 

The obvious criterion is that of hypothesis testing: $\chi^2 = N \pm \sqrt{2N}$ for a 
good fit with $N$ degrees of freedom. One should also apply the same criterion 
to subsets of data (e.g., from a particular experiment or reaction), for which 
the normal range is $\chi_i^2 = N_i \pm \sqrt{2N_i}$. 

In fact, a much stronger criterion applies. The idea was discovered by 
contrasting the criteria for a one-standard-deviation effect in hypothesis 
testing and in parameter fitting. When fitting a single parameter, the one-sigma 
range of the parameter is found by increasing $\chi^2$ one unit above its minimum. 
But the fit is good (hypothesis-testing) at the one-sigma level if $\chi^2_{\min}$ is 
within $\sqrt{2N}$ of its normal value $N$. 
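
The contrast between the two criteria can be written out explicitly. The following is a sketch under the usual Gaussian assumptions; the symbols $p_0$ (best-fit value) and $\sigma_p$ (one-sigma error of the parameter) are introduced here for illustration only.

```latex
\begin{align}
  \chi^2(p) &\simeq \chi^2_{\min} + \frac{(p - p_0)^2}{\sigma_p^2}
  &&\Rightarrow\quad \chi^2(p_0 \pm \sigma_p) = \chi^2_{\min} + 1, \\
  \langle \chi^2_{\min} \rangle &= N, \qquad
  \operatorname{Var}\bigl(\chi^2_{\min}\bigr) = 2N
  &&\Rightarrow\quad \chi^2_{\min} = N \pm \sqrt{2N}.
\end{align}
```

For $N = 100$ points, for instance, the hypothesis-testing tolerance is $\sqrt{200} \approx 14$ units of $\chi^2$, against a single unit for the parameter-fitting criterion.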

What is shown in Ref. [1] is that the goodness of a global fit is better 
tested by applying the parameter-fitting criterion in a certain way to subsets 
of data. This can be much more stringent than the obvious hypothesis-testing 
criterion. 

Table 1. Hypothetical comparison of data and pdf's. 

PDF     TEV          HERA         Total
        (100 pts)    (100 pts)    (200 pts)
CTEQ    85           115          200
MRST    115          85           200


2 Scenario 

Suppose we have two pre-existing good global fits of parton densities, called 
CTEQ and MRST, and that new data arrive from two experiments, TEV and 
HERA. Assume that the $\chi^2$ values are as in Table 1, so that by the 
hypothesis-testing criterion, each set of pdf's gives a good fit to each experiment. 

In fact we may have a bad fit. This can be seen by constructing pdf's 
that interpolate between CTEQ and MRST, $f_p = p f_{\rm CTEQ} + (1 - p) f_{\rm MRST}$, 
and then fitting the interpolating parameter $p$. If, for example, we have 
$\chi^2_{\rm TEV} = 85 + 30(1 - p)^2$ and $\chi^2_{\rm HERA} = 85 + 30 p^2$, then the TEV data imply 
$p = 1 \pm 0.18$, while the HERA data imply $p = 0 \pm 0.18$. 

By converting the problem to one of parameter fitting, we have found 
that the theory and experiments are mutually inconsistent in this case, by 
about $4\sigma$. If the forms for $\chi^2$ are different, it is possible to have a good fit 
to both experiments, but only if neither CTEQ nor MRST fits the data. The 
decreases of 30 in $\chi^2$ between the CTEQ and MRST values are sufficient to 
show that at least one of these situations arises. 
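
The numbers quoted in this scenario can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the two toy $\chi^2$ curves given above, takes the one-sigma width from the $\Delta\chi^2 = 1$ rule, and assumes that the two determinations of $p$ are independent and Gaussian so that their errors combine in quadrature. The function and variable names are hypothetical.

```python
import math

# Toy chi^2 curves of the hypothetical scenario (one interpolating parameter p).
def chi2_tev(p):
    return 85 + 30 * (1 - p) ** 2

def chi2_hera(p):
    return 85 + 30 * p ** 2

def one_sigma_width(curvature):
    """Half-width of the Delta chi^2 = 1 interval for
    chi^2 = chi^2_min + curvature * (p - p0)^2."""
    return 1 / math.sqrt(curvature)

sigma_tev = one_sigma_width(30)   # TEV prefers p = 1
sigma_hera = one_sigma_width(30)  # HERA prefers p = 0

# Separation of the two preferred values in units of the combined error
# (assumption: independent Gaussian errors, added in quadrature).
tension = abs(1.0 - 0.0) / math.hypot(sigma_tev, sigma_hera)

print(f"sigma_p = {sigma_tev:.2f}, tension = {tension:.1f} sigma")
```

Under these assumptions the separation comes out just below four standard deviations, in line with the "about $4\sigma$" quoted in the text.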

3 General procedure and application to CTEQ5 

Observe that in the hypothetical scenario, CTEQ alone obtains a good $\chi^2$, 
and we only saw a problem when we brought in MRST. How can CTEQ alone 
determine that there is a problem without MRST's assistance, and vice versa? 
And how can this be done without knowing which of $\sim 30$ parameters is the 
important one? 

The procedure we propose [1] is as follows: 

• Take pre-determined subsets of data (e.g., experiments). 

• Explore a region of parameter space with $\chi^2 - \chi^2_{\min}$ up to about $\sqrt{2N}$, 
e.g., 50 for CTEQ/MRST, and find the minimum of $\chi_i^2$ for each subset. 

• If $\chi_i^2(\text{global fit}) - \min(\chi_i^2)$ is more than a few units, then experiment $i$ 
disagrees with the global fit. 

Figure 1. Application of our method to the 8 data sets that have the lion's share of the 
data points used in the CTEQ5 [2] analysis. The values of $\Delta\chi^2$ plotted are the deviations 
of the $\chi^2$ from the values at the overall best fit. 



Table 2. Values of $\chi^2$ for data contributing to the CTEQ5 fit. 

Expt               $N$    $\chi^2$   $\chi^2/N$   $(\chi^2 - N)/\sqrt{2N}$
1. NMC D/H         123    111        0.90         -0.8
2. E605            119    92         0.77         -1.8
3. H1 $F_2$ '96    172    108        0.63         -3.4
4. NMC H           104    108        1.04          0.3
5. ZEUS $F_2$ '94  186    249        1.34          3.3
6. BCDMS H         168    146        0.87         -1.2
7. BCDMS D         156    222        1.42          3.7
8. CCFR $F_2$      87     74         0.85         -1.0



• In computing significance, allow for the number of experiments and the 
effective number of parameters determined by each subset of data. 

The minimization of $\chi_i^2$ for a given $\chi^2_{\rm tot}$ can be readily implemented by 
a Lagrange multiplier method: for each value of a parameter $\lambda$, minimize 
$\chi^2_{\rm tot}(p) + (\lambda - 1)\chi_i^2(p)$. The result gives a curve for $\chi_i^2$ as a function of $\chi^2_{\rm tot}$. 

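
The Lagrange multiplier scan can be illustrated on the one-parameter toy model of Section 2, taking the objective to be $\chi^2_{\rm tot} + (\lambda - 1)\chi_i^2$. This is a minimal sketch rather than the CTEQ5 analysis itself: the two toy $\chi^2$ functions stand in for a real global fit with about 30 parameters, which would need a multi-dimensional minimizer, and the helper names `minimize_1d` and `scan_subset` are hypothetical.

```python
# Toy chi^2 functions from the hypothetical TEV/HERA scenario.
def chi2_tev(p):
    return 85 + 30 * (1 - p) ** 2

def chi2_hera(p):
    return 85 + 30 * p ** 2

def chi2_tot(p):
    return chi2_tev(p) + chi2_hera(p)

def minimize_1d(f, lo=-1.0, hi=2.0, tol=1e-8):
    """Golden-section search; adequate for the unimodal toy objective used here."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def scan_subset(chi2_subset, lambdas):
    """For each lambda, minimize chi2_tot + (lambda - 1) * chi2_subset and
    record (lambda, chi2_tot, chi2_subset) at the minimum."""
    curve = []
    for lam in lambdas:
        p_best = minimize_1d(lambda p: chi2_tot(p) + (lam - 1) * chi2_subset(p))
        curve.append((lam, chi2_tot(p_best), chi2_subset(p_best)))
    return curve

# lambda = 1 reproduces the global fit; larger lambda weights the subset more.
curve = scan_subset(chi2_tev, [1, 2, 4, 8, 16])
for lam, tot, sub in curve:
    print(f"lambda={lam:>2}  chi2_tot={tot:7.2f}  chi2_TEV={sub:6.2f}")
```

Each entry of `curve` is one point on the curve of $\chi_i^2$ against $\chi^2_{\rm tot}$. In this toy model the subset $\chi^2$ falls by several units for an increase of only a unit or two in the total, which is the symptom the procedure looks for.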
The results of applying this procedure to the CTEQ5 fit are shown in Fig. 
1. Several of the data sets can be seen to be bad fits, notably from CCFR 
and BCDMS. The criterion is that the $\chi^2$ for the subset of data decreases by 
too many units as $\chi^2_{\rm tot}$ increases. A small decrease is normal and expected. 
The bad fit happens even though nothing exceptional happens according to 
the hypothesis-testing criterion, as can be seen in the last column of Table 2. 
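
The last column of Table 2 is the hypothesis-testing deviation $(\chi^2 - N)/\sqrt{2N}$ for each data set, and it can be recomputed from the tabulated $N$ and $\chi^2$. The short sketch below does so; the function name is hypothetical. Because the inputs in the table are rounded, the recomputed values should be expected to agree with the printed column only to about 0.1.

```python
import math

# (experiment, N, chi2) as listed in Table 2.
table2 = [
    ("NMC D/H", 123, 111),
    ("E605", 119, 92),
    ("H1 F2 '96", 172, 108),
    ("NMC H", 104, 108),
    ("ZEUS F2 '94", 186, 249),
    ("BCDMS H", 168, 146),
    ("BCDMS D", 156, 222),
    ("CCFR F2", 87, 74),
]

def normalized_deviation(chi2, n):
    """(chi2 - N) / sqrt(2N): deviation of chi2 from its expected value N,
    in units of the expected standard deviation sqrt(2N)."""
    return (chi2 - n) / math.sqrt(2 * n)

for name, n, chi2 in table2:
    dev = normalized_deviation(chi2, n)
    print(f"{name:<12} chi2/N = {chi2 / n:.2f}   (chi2 - N)/sqrt(2N) = {dev:+.1f}")
```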

MRST's plots of $\chi^2$ against $\alpha_s(M_Z)$ in Fig. 21 of Ref. [3] independently 
confirm our physics conclusion, that current data and QCD theory (including 
the approximations of NLO calculations, neglect of nuclear target and 
higher-twist corrections, etc.) are not in good agreement. 



4 Conclusions 



We have shown that the quality of a global fit is correctly determined by 
testing the variation of $\chi^2$(subset) for subsets of data as parameters are varied. 
A substantial decrease is a symptom of a bad fit, and the parameter-fitting 
criterion is the correct one here. With this method small data sets do not get 
lost compared with the other $\sim 1000$ points. The current CTEQ5 global fit 
appears not to fit the data, and the same appears to apply to the MRST fit. 

Statistical analysis alone cannot tell us the explanation of this 
inconsistency. Only a physics-based analysis can decide if the problem is in one of 
the experiments, if there is a technical error in a theory calculation, or if 
there is really new physics that has been measured. The statistics only give 
a diagnosis of where further investigation will be most useful. 